Shape Context Matching For Efficient OCR
Similar resources
A Duplicate Chinese Document Image Retrieval System
An optical character recognition (OCR) system enables a user to feed an article directly into an electronic computer file, translate the optically scanned bitmaps of text characters into machine-readable codes (that is, ASCII, Chinese GB, or Big5 codes), and then edit the result with a word processor. OCR is hence being employed by libraries to digitize and preserve their holdings. Billi...
Context shapes: Efficient complementary shape matching for protein-protein docking.
We describe an efficient method for partial complementary shape matching for use in rigid protein-protein docking. The local shape features of a protein are represented using boolean data structures called Context Shapes. The relative orientations of the receptor and ligand surfaces are searched using precalculated lookup tables. Energetic quantities are derived from shape complementarity and b...
Pattern matching techniques for correcting low-confidence OCR words in a known context
A commercial OCR system is a key component of a system developed at the National Library of Medicine for the automated extraction of bibliographic fields from biomedical journals. This 5-engine OCR system, while exhibiting high performance overall, does not reliably convert very small characters, especially those that are in italics. As a result, the “affiliations” field that typically contains...
CS540 Machine Learning Clustering of Typeset Mathematical Symbols Using Spectral Methods and Shape Contexts
Optical character recognition (OCR) of natural languages, both typeset and handwritten, is successfully used today in a wide range of applications. OCR of mathematical expressions and mathematical symbols is not yet as advanced, however. This project demonstrates a method for recognising typeset mathematical symbols. The method involves using spectral methods to perform semi-supervised clusteri...
Japanese OCR Error Correction using Character Shape Similarity and Statistical Language Model
We present a novel OCR error correction method for languages without word delimiters that have a large character set, such as Japanese and Chinese. It consists of a statistical OCR model, an approximate word matching method using character shape similarity, and a word segmentation algorithm using a statistical language model. By using a statistical OCR model and character shape similarity, the ...